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Abstract 


Cross-correlating the data on neutral hydrogen (H I) 21 cm intensity mapping with galaxy surveys is an effective 
method to extract astrophysical and cosmological information. In this work, we investigate the cross-correlation of 
MeerKAT single-dish mode HI intensity mapping and China Space Station Telescope (CSST) spectroscopic 
galaxy surveys. We simulate a survey area of ~300 deg? of MeerKAT and CSST surveys at z — 0.5 using Multi- 
Dark N-body simulation. The PCA algorithm is applied to remove the foregrounds of H I intensity mapping, and 
signal compensation is considered to solve the signal loss problem in H I-galaxy cross power spectrum caused by 
the foreground removal process. We find that from CSST galaxy auto and MeerKAT-CSST cross power spectra, 
the constraint accuracy of the parameter product Qy by 1H 1, can reach ~1%, which is about one order of 
magnitude higher than the current results. After performing the full MeerKAT H I intensity mapping survey with 
5000 deg? survey area, the accuracy can be enhanced to <0.3%. This implies that the MeerKAT-CSST cross- 
correlation can be a powerful tool to probe the cosmic H I property and the evolution of galaxies and the Universe. 


Key words: (cosmology:) large-scale structure of universe — (cosmology:) cosmological parameters — Cosmology 


1. Introduction 


Probing the large-scale structure (LSS) of the Universe has 
always been one of the main missions of cosmological 
observations. Constraining the property of dark matter and 
dark energy, recovering the primordial fluctuations and testing 
gravity theories are all in need of cosmological surveys with 
large survey area and wide redshift coverage. To achieve this 
target, line intensity mapping (LIM) has been proposed and 
proven to be an efficient technique. LIM makes use of the 
emission lines from the energy level transition of atoms or 
molecules, such as HI 21 cm, C II, CO, Lya, Ha, [O m], etc. 
(see, e.g., Visbal & Loeb 2010; Carilli 2011; Gong et al. 2011; 
Lidz et al. 2011; Gong et al. 2012, 2013; Silva et al. 2013; 
Gong et al. 2014; Pullen et al. 2014; Uzgil et al. 2014; Silva 
et al. 2015; Gong et al. 2017; Fonseca et al. 2017; Gong et al. 
2020). These lines can reflect different properties and 
progresses of galaxy evolution, and can be good tracers of 
the LSS. 

Instead of the traditional observations targeting the resol- 
vable sources, intensity mapping probes accumulative intensity 
of all sources in a spatial volume (voxel) defined by survey 


spatial and frequency resolutions. So even though some sources 
are too faint to be detected in traditional sky surveys, in 
principle, their signals can be probed in intensity mapping. In 
addition, the frequency shifts of the emission lines are a natural 
probe of redshift, so intensity mapping is expected to be a 
powerful tool to obtain cosmic 3D matter structure information 
traced by emission lines from galaxies with high efficiency and 
relatively low cost. Among various emission lines, the HI 
2] cm line from atomic hydrogen is the most widely studied in 
intensity mapping research (see e.g., Chen 2011, 2012; Battye 
et al. 2013; Dickinson 2014; Newburgh et al. 2014; Bandura 
et al. 2014; Santos et al. 2015; Smoot & Debono 2017; Wang 
et al. 2021; Cunnington et al. 2023; Deng et al. 2022; Zhang 
et al. 2022; Spinelli et al. 2022; Perdereau et al. 2022). Besides 
being a main probe of epoch of reionization, the neutral 
hydrogen 21 cm line has a tight connection with star formation 
and galaxy evolution, and it can trace the galaxy and hence 
dark matter distribution at low and high redshifts. 

While many experiments about HI intensity mapping have 
been proposed or are already running, the foreground 
contamination problem is still one of the biggest challenges, 
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as the foregrounds can be as large as five orders of magnitude 
higher than the signal. The high brightness temperature of the 
Galactic emission and other sources makes the H I signal hard 
to be detected from auto-correlations. In principle, cross- 
correlating the 21 cm observation with an optical galaxy survey 
in the same survey area is a good method to reduce the 
foreground contamination and instrumental noise, and extract 
the signal (e.g., Chang et al. 2010). The signal-to-noise ratio 
(SNR) can be significantly improved since the foregrounds and 
instrumental noise of different wave bands in different surveys 
are barely correlated. 

However, in practice, the cross-correlation result is not fully 
satisfied due to the complex components of the foreground. So, 
the foreground removal algorithms are still needed in cross- 
correlations. Various algorithms have been applied, including 
the blind foreground removal techniques like principal 
component analysis (PCA) (Davis et al. 1985) and independent 
component analysis (ICA) (Wolz et al. 2014) which make use 
of different frequency smoothness of foreground and signal, the 
polynomial/parametric-fiting method which fits the physical 
properties of the foreground (Bigot-Sazy et al. 2015), machine 
learning (ML) methods (Li & Wang 2022), etc. Although 
signal loss and foreground residual are usually inevitable, 
foreground removal techniques do make progress and are 
necessary in cross-correlation detection. 

Currently, positive results on HI abundance and H I-galaxy 
correlation have been obtained by several experiments. The 
Green Bank Telescope (GBT) implemented their H I intensity 
mapping correlation detection with the Deep2 optical redshift 
survey (Chang et al. 2010), WiggleZ Dark Energy Survey 
(Masui et al. 2013) and eBOSS survey (Wolz et al. 2022). In 
addition, the Parkes radio telescope also presented their work 
on correlating HI intensity mapping with the 2dF galaxy 
survey (Anderson et al. 2018). Recently, MeerKAT accom- 
plished HI intensity mapping correlation detection with the 
WiggleZ survey (Cunnington et al. 2023). They all constrain 
the H I-galaxy correlation parameter product Qy jbg iru r4 at 
different redshifts, where Qy 1, by ; and rg 1, are the H I energy 
density parameter, H I bias, and correlation coefficient of HI 
and galaxy, respectively. In this work, we will determine the 
constraint power on neutral hydrogen parameters by the 
observations of MeerKAT and the next-generation galaxy 
survey of China Space Station Telescope (CSST). 

MeerKAT is a pathfinder project of the Square Kilometre 
Array (SKA) and in the future will become a part of SKA-mid 
(Santos et al. 2017; Bacon et al. 2020). It is a state-of-the-art 
intensity mapping instrument which is capable of complement- 
ing and extending cosmological measurements at a wide range 
of wavelengths. While MeerKAT is a large interferometric 
array which can access small scales of cosmic structure, single- 
dish mode is preferred in intensity mapping experiments. We 
plan to perform MeerKAT HI intensity mapping cross- 
correlation with the China Space Station Optical Survey 
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Table 1 

Simulation Parameters for SMDPL 
Parameters Values 
Lioz 400 Mpc/h 
N, 3840° 
M, 9.63 x 107 Mgun/h 
€ 1.5 kpe/h 
h 0.6777 
Qu 0.307 
Op 0.048 
Qa 0.693 
ns 0.96 
7g 0.8228 


(CSS-OS) (Zhan 2011; Cao et al. 2018; Gong et al. 2019; 
Zhan 2021). CSS-OS is the major observation project of CSST, 
and it will cover 17,500 deg? of sky area in a 10 yr working 
time. In addition, the spectroscopic survey of CSS-OS will 
provide a large amount of data in the form of a galaxy catalog 
with verified redshift using slitless gratings. CSST is planned to 
start its observation around 2024, while MeerKAT will still be 
working full-time before SKA which will begin full operations 
in 2028, and these two surveys will have a large overlapping 
survey area. Thus we believe MeerKAT H I intensity mapping 
and CSST galaxy survey would make promising cross- 
correlation detection in the coming future. 

This paper is organized as follows: in Section 2, we 
introduce our method of creating mock data of MeerKAT HI 
intensity mapping and CSST spectroscopic galaxy surveys; in 
Section 3, we apply the PCA algorithm to remove the 
foreground in HI intensity map; in Section 4, we calculate 
the galaxy auto and H I-galaxy cross power spectra, and discuss 
the signal compensation method for cross power spectrum; in 
Section 5 we forecast the constraints on relevant cosmological 
parameters; we conclude our work and provide discussion in 
Section 6. 


2. Mock Data 


We generate MeerKAT intensity maps and CSST spectro- 
scopic galaxy survey data using MultiDark cosmological 
simulations (Klypin et al. 2016). MultiDark is a suite of N- 
body cosmological simulations which have been carried out by 
L-GADGET-2 code. Most simulations of this suite have 3840? 
particles, with box sizes ranging from 250 Mpc/h to 2500 
Mpc/h. Based on the survey area and redshift of the MeerKAT 
observation plan, the Small MultiDark Planck simulation 
(SMDPL) has been chosen in this work. The box size of 
SMDPL is 400 Mpc/h, and halos in SMDPL boxes are 
identified through the halo finding code Friends-of-Friends 
(FOF) with relative linking length of 0.2 (Davis et al. 1985). 
The relevant simulation and cosmological parameters that 
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Figure 1. The H I mass My ; and halo mass M relation we use at z — 0.5 (green 
curve), which is derived from the relations at z = 0 (blue dashed curve) and 
z = 1 (orange dashed curve) given in Villaescusa-Navarro et al. (2018). 


SMDPL adopted are listed in Table 1, and its halo catalog can 
be acquired from the CosmoSim database.? 

In our work, we focus on the cosmology at z — 0.5, which is 
one of the main observational target redshifts for both CSST 
and the MeerKAT L-band. So, our mock data are generated 
from the snapshot70 of SMDPL, whose redshift z ~ 0.5. We 
also find that the 400 Mpc/A box size of the snapshot70 
corresponds to a survey of ~297 deg? at z = 0.5. Note that the 
non-flat sky effect may need to be considered for a ~300 deg? 
sky coverage, but for simplicity, we still use the flat sky 
approximation in our mock data analysis. 


2.1. HI Intensity Mapping with MeerKAT 


Since HI could only survive from ultraviolet (UV) radiation 
in dense clumps in galaxies after the epoch of reionization, we 
assume that HI can only exist in halos hosting galaxies at 
z=0.5. We place the HI mass in the center of a halo, as has 
been proven to be reasonable in previous studies (see, e.g., 
Villaescusa-Navarro et al. 2018). Under this assumption, we 
construct a catalog applying the halo HI mass function given 
by Villaescusa-Navarro et al. (2018), and it takes the form 


a 0.35 
Mist, = M(H) al (4) | (1) 


Here M is the halo mass, and we have three free parameters, 
i.e., a, Mo and Mnin, which determine the shape of the fitting 
curve at different redshifts. In order to get the values of these 
three parameters at z= 0.5, we perform interpolation on the 
fitting values at z= 0 and 1 given in Villaescusa-Navarro et al. 
(2018). Then we find that a = 0.42, Mp = 2.50 x 10!? n^! Mo, 
Mmin = 1.13 x 10? 57! Mo at z= 0.5. In Figure 1, we plot the 


? The data are available at https:/ /www.cosmosim.org/. 
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My;—M relation at z —0.5 (green solid curve), and the 
relations at z — 0 (blue dashed curve) and 1 (orange dashed 
curve) from Villaescusa-Navarro et al. (2018) are also shown 
for comparison. 

Then we can calculate the HI energy density parameter Qy 1, 
which is expressed as 


WO = fno, JMM, dM, Q) 
where po, is the critical density of the present Universe, and n 
(M, z) is the halo mass function (Sheth & Tormen 1999), which 
can be derived from our simulation. We find that 
Oy += 6.73 x 107^ in our simulation, which agrees with the 
estimation of Q5, — z relation given in literatures (see, e.g., 
Villaescusa-Navarro et al. 2018). 
Next, we can create the map of HI brightness temperature. 
The brightness temperature field ôy traces the underlying matter 
fluctuations 6,, as 


ór(r, z) = T(z)bui()6n(r, z), (3) 


where T(z) is the mean H I brightness temperature at z, and by 1 
is the H I bias, which can be estimated by 


[n«M. z)b(M, 2Mu(M, z)dM 


b = 
any {nM z)Mm (M, 2)dM 


(4) 


Here b(M, z) is the halo bias. So for a voxel with position on 
the sky r and redshift z, its HI brightness temperature can be 
derived as 


h 
E(z) 
= To(z) x Qu(r, z), (5) 


where E(z) = H(z)/ Hy represents the evolution of the Hubble 
parameter, and Tọ is a redshift dependent parameter which is 
defined as 7) = 189hE(z) 1-- z)?. Then Tj(z) can be 
estimated by averaging Ty(r, z) at different positions in the 
simulation box. At z= 0.5, we find that the corresponding 
mean HI brightness temperature is 7j = 0.145 mK in our 
simulation. The brightness temperature of the H I distribution 
(right panel) and the corresponding dark matter distribution 
(left panel) in the simulation are shown in Figure 2. 

After obtaining the HI brightness temperature in the 
simulation box, our next step is to create the HI intensity 
maps with MeerKAT instrumental parameters and observa- 
tional effects. Since the observable of H I intensity mapping is 
the HI brightness temperature of each voxel in the survey 
volume, we divide the survey volume (here this means our 
simulation box) into voxels that MeerKAT can observe. The 
details are as follows: 


T(r, z) = 189 Qu, z)(1 + z? [mK] 


1. To divide frequency bins along the line of sight (LOS), 
we put the center of the box at z — 0.5. As the box length 
of 400 Mpc/h is known, the redshift range of the survey 
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Figure 2. (Left) The dark matter distribution at z = 0.5 in the simulation. (Right) The corresponding map of H I brightness temperature. We can see that the H I map 


has similar structure as dark matter, and can be a tracer for the LSS. 


volume can be calculated. We find that the redshift range 
of snapshot70 is 0.415 ~ 0.590, corresponding to the 
observed H I frequency of 1004.14 MHz ~ 893.30 MHz. 
This frequency range can be observed by the MeerKAT 
L-band with frequency resolution of 0.2 MHz, and it 
allows us to divide the survey volume into 554 bins, such 
that each bin width is about 0.72 Mpc/h. For simplicity, 
we assume that there is no redshift evolution in this range. 

2. As for the pixels perpendicular to LOS, since we plan to 
use the single-dish mode observation, resolution @ is 
defined by the full width at half maximum (FWHM) of 
the beam of an individual dish. Then the beam size or 
spatial resolution is given by 


= 10229. (6) 
Daish 


where A op; is the observed wavelength, and Daisn is the 
dish aperture diameter. We find that the spatial resolution 
of MeerKAT at z= 0.5 is 1.36 deg. Since the size of a 
simulation box is 400 x 400 (Mpc/ hy, corresponding to 
a 297 deg? survey area, the number of pixels in an HI 
map is found to be 12 x 12 for MeerKAT single-dish 
mode observation. We note that the current spatial 
resolution given by the FWHM of the beam is a choice 
of simplicity, and more realistic resolution will be 
considered in the future work. Moreover, since the beam 
size actually changes with frequency, it can introduce 
more complexity and challenges into the foreground 
subtraction. However, because our simulation snapshot 
has no redshift evolution, for simplicity, we do not 
consider the frequency dependence of the beam size, and 
set the pixel size of all the maps to be the same. 


The HI signal intensity map obtained by MeerKAT at 
z= 0.5 is displayed in the upper left panel of Figure 3. In real 
observation, the HI intensity mapping will be contaminated by 
different components, such as system thermal noise, fore- 
ground emission from the Milky Way, radio frequency 
interference (RFI), etc., which can lower the SNR. Here we 
model the system thermal noise of a single-dish as Gaussian 
noise. Its root mean square (rms) noise temperature can be 


calculated as (Bull et al. 2015) 
VAs/8, (7) 


Tys X 
NIA 02A, 
where óv is the frequency interval, ftot is the total observation 
time of the survey, A, is the effective collecting area of a dish, 
As is the survey area and T,,, is the system temperature which 
is usually described as a combination of four components, 
yielding 


Tyys = sky (V) T Tspill + Tum + Tec- (8) 


The mean sky temperature can be approximated by 
T4,:2.,25 + 16(4/GHz) ^^. and Ton, Tam and Te 
represent spillover temperature, atmosphere temperature and 
receiver temperature, respectively. The values of these 
parameters we adopt are listed in Table 2, and then we obtain 
O7 — 0.102 mK at v = 946.7 MHz (z = 0.5). The corresponding 
map of Gaussian system noise is shown in the upper right panel 
of Figure 3. 

The foreground emission from the Milky Way is actually the 
main challenge to HI intensity mapping. The brightness 
temperature of foregrounds can be more than 4 orders of 
magnitude brighter than HI signal, so its effect has to be 
seriously taken into account in our forecast of MeerKAT 
observation. Here we generate the foreground emission using 


202307.00716v1 


chinaXiv 


Research in Astronomy and Astrophysics, 23:075003 (12pp), 2023 July 


iz 
m 
o 


HI Brightness Temperature [mK] 


157.5 160.0 162.5 165.0 1675 170.0 
RA. 


Foreground emission[K] 


6.0 
10 
5.8 
8 
6 5.6 
"ES 54 
m 
g 
a 
2 
52 
o 
-2 5.0 
3E 48 
160 162 164 


Jiang et al. 


Dec. 
Systematic Noise[mK] 


B 


0.4 
10.0 0.3 
7.5 0.2 
5.0 0.1 
0.0 
2.5 
-0. 
0.0 
-0.2 
-2.5 
-0.3 
-5.0 
-0.4 


155.0 157.5 160.0 e us 5 1650 1675 170.0 


6.0 
10.0 
5.8 
75 
5.6 
5.0 
5.4 
x 
2.5 E 
5.2 
0.0 
.0 
-25 > 
-5.0 4.8 


155.0 157.5 160.0 RAS 5 165.0 167.5 170.0 


Dec. 


Figure 3. The mock MeerKAT intensity maps for the central slice in the simulation at v — 946.7 MHz or z — 0.5. The upper left panel is the signal map of HI 
brightness temperature. The upper right panel features the map of Gaussian system noise. The bottom left panel is the total foreground map generated by GSM2016 at 
v = 946.7 MHz, including Galactic synchrotron emission, free-free emission, cold and warm dust thermal emission, the cosmic microwave background (CMB) 
anisotropy, and Galactic H I emission. The bottom right panel depicts the total sky map observed by MeerKAT containing all components we consider. 


Table 2 
Specifications of the MeerKAT Observations 
Parameters Values 
Antennas All 64 MeerKAT dishes 
Observation mode Single-dish 
Dish diameter 13.5m 
System temperature 20K 


856-1712 MHz 
0.2MHz 
200 hr per dish 


L-band Frequency range 
Frequency resolution 
Survey time 


the GSM2016 model (Zheng et al. 2017). GSM2016 is an 
improved model of the original GSM. It uses an extended PCA 
algorithm to identify different components in the diffuse 
Galactic emission. Six components of Galactic emission that 
match the known physical emission mechanisms are obtained, 
i.e., synchrotron emission, free-free emission, cold and warm 
dust thermal emission, the CMB anisotropy and Galactic HI 


emission. This algorithm allows it to make use of 29 sky maps 
from 10 MHz to 5 THz, and do interpolation to produce a full- 
sky map of any frequency in this frequency range. 

To apply the foreground model on our HI map, the 
coordinates of the survey area have to be set. According to 
the previous work (Wang et al. 2021), we set our survey area at 
153238 < R.A. «170762 and —5°62 < decl.<11°62, mostly 
intersecting with the WiggleZ 1lhr field (Drinkwater et al. 
2010) (Drinkwater et al. 2018). Note that this choice of survey 
area is only for discussion here, since this area has relatively 
low Galactic emission in the full-sky map. For future 
observation of MeerKAT-CSST cross-correlation, the target 
survey area can be chosen from anywhere in the overlapping 
region of the MeerKAT and CSST survey area with relatively 
low foreground emission. We generate foreground maps at 
each frequency bin, and then interpolate them to the center of 
each voxel. 

The foreground map for the survey area at v — 946.7 MHz 
(z — 0.5) is shown in the lower left panel of Figure 3. We find 
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that the foreground contamination we consider is about 4 
orders of magnitude brighter than the HI signal. After 
combining the HI signal, foreground emission and system 
noise maps, we obtain the total sky map observed by 
MeerKAT. The mock total observational map is displayed in 
the lower right panel of Figure 3. Since the HI signal has 
totally drowned in the contamination, the foreground con- 
taminant subtraction algorithms have to be applied to extract 
the HI signal. We will discuss the foreground removal method 
in the next section. 


2.2. Galaxy Survey with CSST 


We use the same simulation data SMDPL snapshot70 to 
create the mock data of CSST spectroscopic galaxy survey. We 
utilize Python package Halotools'® to generate a galaxy 
distribution for each dark matter halo in the simulation. First, 
the structure of a cold dark matter (CDM) halo can be described 
by an NFW profile (Navarro et al. 1996). The halo 
concentration-mass relation under the NFW profile is fitted 
by Dutton & Macció (2014) 


logig Cvir = a + blogo(Mys/[10? h! Mo), (9) 


where M, is the halo virial mass, c,j;, is the concentration of 
the corresponding halo, and a and b are the fitting parameters, 
which are expressed as 


a = 0.537 + (1.205 — 0.537)exp( —0.718z! 98), (10) 
b = —0.097 + 0.024z. (11) 


After obtaining the halo concentration, the halo occupation 
distribution (HOD) model can be applied to get galaxy 
distribution. The HOD model can determine the population 
of the central galaxy and satellite galaxies in a given halo. The 
central galaxy occupation statistics are given by Zheng et al. 
(2007) 


] M-1 Mmi 
(Neen) = fi + «( Oio Sio min J (12) 


Togig M 


and central galaxies are assumed to reside at the centers of their 
host halos. On the other hand, the distribution of satellite 
galaxies is written as 


(13) 


When redshift, cosmological model and the threshold of galaxy 
absolute magnitude are set, the values of the parameters in 
Equations (12) and (13) are calculated by the Halotools 
package based on the model published in Zheng et al. (2007). 
At z — 0.5, we set the threshold of galaxy absolute magnitude 
to be —19.5, and then these parameters are expressed as 


10 https: / /halotools.readthedocs.io / 
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logo Mmin = 11.35, Ojog,M = 0.28, log; Mo = 11.69, 
logo; M; = 13.01 and a= 1.06. 

The final step in generating a galaxy catalog is to determine 
which galaxies can be observed by CSST. We assign 
luminosity to galaxies using the relation between host halo 
mass and galaxy luminosity, which is given by Vale & Ostriker 
(2008) 


0.88 
M 
L group = {4} h? Lo, (14) 
(M/M^y E, 


Leen = L 
"T+ (M/M: 


Lo, (15) 


and for simplicity, if assuming all satellite galaxies have the 
same luminosity, we have 


1 
La — = 
Lu)" — 


sat 


(Lgroup m Leen). (16) 


Here Leroups Leen and Lsa are the luminosities of a galaxy group, 
central galaxy and satellite galaxy, respectively. Nsa is the 
number of satellite galaxies in a galaxy group. The parameter 
values are chosen to be La= 0.3 Lo Lo = 2.8 x 10° L (Zheng 
et al. 2007), M’ = 3.7 x 10? h! Ms, a= 29.78, b = 29.5 and 
c — 0.0255 (Vale & Ostriker 2008). Then the galaxy luminosity 
can be converted to the magnitudes in CSST spectroscopic 
bands. Since the magnitude limit of the CSST spectroscopic 
survey is ~23 mag (Gong et al. 2019; Zhan 2021), galaxies 
whose magnitude is under this limit can be selected to form the 
CSST spectroscopic galaxy survey catalog. After the selection, 
we find that the galaxy number density in the simulation box is 
9.07 x 10? (Mpc/h) ^, which is in good agreement with the 
result in previous works (e.g., Gong et al. 2019). The mock 
map of the CSST spectroscopic galaxy survey at z — 0.5 is 
depicted in Figure 4. 


3. Foreground Removal 


The signal extraction of H I intensity mapping highly relies 
on foreground removal efficiency. Theoretically, cross-correla- 
tion with other tracers (e.g., galaxies and other emission lines) 
could be a good way to extract the H I signal and reduce the 
effects of foregrounds and system noise, since the foregrounds 
and instrumental noise of different wave bands in different 
surveys should be uncorrelated. However, it is found that it will 
be problematic if directly cross-correlating the raw intensity 
map with other surveys. In Figure 5, we show the results of our 
mock MeerKAT H I raw (blue data points) and signal (red data 
points) maps cross-correlated with the mock CSST spectro- 
graphic galaxy map. We can see that the effect of foreground 
contamination is still too huge to extract correct cosmological 
information, as there is large deviation at all scales between the 
two curves. This indicates that extra foreground removal 
methods should be performed before cross-correlation. 
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Figure 4. The mock galaxy map observed by CSST spectroscopic survey at 
z — 0.5, which is in the same survey area as the MeerKAT H1 intensity 
mapping survey. 
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Figure 5. The cross-correlation power spectra of the MeerKAT H I raw (blue 
data points) and signal (red data points) intensity maps with the corresponding 
CSST spectroscopic galaxy map. The missing data points in the raw cross 
power spectrum (blue data points) have negative or small values, so they are 
not shown in the figure. 


Many methods of foreground removal have been discussed 
in previous works. These include blind foreground subtraction 
algorithms, such as PCA and Singular Value Decomposition 
(SVD) (Davis et al. 1985; Paciga et al. 2011; Villaescusa- 
Navarro et al. 2017; Yohana et al. 2021), ICA (Wolz et al. 
2014), correlated component analysis (CCA) (Bonaldi et al. 
2006), extended ICA (Zhang et al. 2016) and FASTICA 
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(Chapman et al. 2012), non-parametric Bayesian methods, like 
Gaussian Progress Regression (GPR) (Mertens et al. 2018; 
Ghosh et al. 2020), and methods assuming some physical 
properties of the foregrounds, such as polynomial/parametric- 
fitting (Bigot-Sazy et al. 2015; Alonso et al. 2015). Here we use 
the PCA/SVD algorithm to perform foreground removal. Since 
the PCA method is based on identifying different correlations 
of corresponding components in the frequency domain, in 
principle, this method can distinguish the foregrounds from the 
HI signal by their different frequency smoothness. Besides, 
PCA does not require much knowledge about the models of 
data components, which is suitable in our case. On the other 
hand, SVD is a similar method that can be applied on a data 
matrix, which can obtain similar results as PCA but with fewer 
calculation steps. 

To apply our foreground removal procedure, first we 
transform the simulation result into data matrix X with 
dimensions N, x Ny. Here N, is the number of frequency 
channels, and N, is the number of pixels in a frequency channel 
of the intensity map. Then SVD can decompose the data matrix 
X in the form 


X = WTYXR, (17) 


where W' and R are called left and right singular vectors, 
respectively, and X is a rectangular diagonal matrix of singular 
values. W" and R are unitary matrices, which are defined as 
WW*=1 and RR*=1, where asterisk denotes conjugate 
transpose. Generally, when dealing with a complex valued 
matrix X, Equation (17) takes the form of X = W*DR. But since 
our data matrix X is real, we use transposed matrix W' to 
substitute W*. 

Singular vectors W' and singular values X are equivalent to 
the eigenvectors and eigenvalues in PCA, respectively. So, we 
rank the singular vectors in decreasing order of their 
corresponding singular values to identify the principal 
components of the data matrix, i.e., the foregrounds. 

Then we compose an N, x m projection matrix W' with the 
first m columns of W, where m is the number of components 
which are thought to be foregrounds. In Figure 6, we show the 
first five principal components decomposed from the data 
matrix X. We notice that the first two components are relatively 
smooth in frequency smoothness, so they are identified as 
foreground, i.e., m — 2. The dominant principal components 
will be obtained when the data matrix X is projected onto the 
projection matrix W’ by 


U- W'T .Xx, (18) 
V2 W!-U. (19) 


Here U is the foreground information constructed from the data 
matrix. Then the H I signal can be recovered as 


Sg; — X — V. (20) 
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Figure 6. The first five principal components of data matrix X decomposed by 
the PCA/SVD method. 


At last, the recovered signal is projected back to the original 
map position for obtaining the foreground-removed map. In 
principle, the foreground-removed map is composed of HI 
signal and system noise. The effect of the PCA procedure can 
be indicated more clearly in a line intensity power spectrum as 
we discuss in the next section. 


4. Power Spectrum 


Here we introduce the process of line intensity power 
spectrum estimation. We consider the cross-correlation using 
the method based on Wolz et al. (2017). The galaxy survey and 
intensity mapping data are converted into galaxy over-density 
and brightness over-temperature contrasts respectively by 


N (xi) — (N) 
j)— —————, 21 
ôg xi) (Nj Q1) 
ór(xi) = Thix) — (Tuy). (22) 


where the angled brackets denote mean values. The Fast 
Fourier Transforms of N(x;) and Ty i(x;) are given by 


N(k) = X Nae, (23) 
T(k) = 3 5 Tii) e. (24) 


Then our estimators for the galaxy auto-correlation power 
spectrum P, and the cross power spectrum between the gridded 
galaxy distribution and intensity map P, at wavevector k are 


Pilk) = V (6,0065()) — Psn, Q5) 
PAK) = V Re(6,(k)67()]. (26) 


Here V = 4003(Mpc A7? is the survey volume and Psy is the 
shot noise term for the galaxy survey, which can be estimated 
by Psn = 1/(N). The error for the corresponding power 
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spectrum is given by Feldman et al. (1994); Wolz et al. (2017) 
2c 

4 Vk27Ak 

x JPZ% + (Pr(k) + Pn(k))(P,(k) + Psn), 


(28) 


op (k) = 


(F,(k) + Psn), (27) 


2v 
he 
V2Vk2Ak 


where Ak is k-bin width, Py is the HI brightness temperature 
power spectrum and Py(k) is the power of system noise. 

After performing the PCA/SVD foreground subtraction, we 
estimate and show the cross power spectra of CSST galaxy 
with foreground-free (blue data points) and foreground- 
subtracted (red data points) maps in MeerKAT HI intensity 
mapping survey, in the left panel of Figure 7. We can find that 
signal loss can be caused by the PCA procedure, and it 
becomes severe especially at large scales that we are interested 
in. Therefore, the over-eliminated signal must be compensated. 

We compensate the cross power spectrum based on the 
method given in Cunnington et al. (2022), and the procedure 
can be described as follows: 


1. First, we generate mock data of halos. The mock halo 
catalogs include the information on mass and position, 
which follows the same matter power spectrum and halo 
mass function as SMDPL simulation. 

2. After that we calculate the H I brightness temperature of 
mock data using the HI model and MeerKAT observa- 
tional effect. So, the mock HI intensity map data are 
obtained and further transformed into mock data matrix Y 
with the same dimensions of data matrix X. 

3. Then the mock data matrix Y is injected into the data 
matrix X. We apply the PCA clean on this data 
combination with the same projection matrix W’ we used 
in previous PCA. Then the foreground-removed mock 
data can be written as 


Y; = [Y + X]pca — Sui. (29) 


So, we can determine the signal loss of the cross power 
spectrum between Y and Yù. 
4. Finally, the transfer function is constructed as 
P(Y, Y, 
T(k) = Ta (30) 
PY, Yo) 
where P() denotes the cross power spectrum and Y, is the 
corresponding mock galaxy data. 


To compensate the signal loss, we construct the transfer 
function 7(k) by generating 100 HI intensity mapping mock 
data and corresponding galaxy survey data. The result of the 
transfer function is shown in Figure 8. 

The cross-correlation compensated by transfer function is 
displayed in the right panel of Figure 7. We can find that the 
signal compensation method we use is efficient, and that the 
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Figure 7. (Left) The MeerKAT-CSST cross power spectra of H I foreground free (blue data points) and foreground removal by PCA (red data points). (Right) The 
same as the left panel but considering signal compensation after PCA foreground removal (red data points). The corresponding relative errors are also shown in green 
dashed curves in the lower panels, where AP is the difference between the two power spectra in the upper panels. 
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Figure 8. The transfer function Z(k) estimated by 100 H I intensity mapping 

and corresponding galaxy survey mock data to compensate the signal loss after 

PCA foreground removal. 


compensated power spectrum is very consistent with the 
foreground-free power spectrum at lo. Although over- 
compensation may happen due to large variance in the low k 
range, the transfer function is reliable enough that the effect of 
signal loss can be effectively reduced. 


5. Cosmological Constraint 


After obtaining the cross power spectrum of MeerKAT HI 
intensity mapping and CSST spectroscopic galaxy surveys, we 
can explore the constraint power on cosmological parameters. 
Theoretically, the power spectra of different tracers have a 


similar relation to the matter power spectrum. The galaxy auto 


power spectrum P,(k) is related to the matter power spectrum 
PK) as 


Eo DB. (31) 


where b, is the galaxy bias. On the other hand, the relation of 
the HI intensity auto power spectrum P;(k) and the matter 
power spectrum can be written as 


Pr(k) = T? bg Pn(k) 
= Té OR bá Ba). (32) 


Here bg, is the HI bias, and T, is the mean HI brightness 
temperature in the Universe. Then the cross power spectrum 
P,.(k) is given by 


BK) = ToC bu 1bgrH isis (K), (33) 


where rug is the HI-galaxy correlation coefficient that 
indicates the correlation strength. 

Note that the parameter 7o can be absorbed into the H I bias 
by 1 (Wolz et al. 2017). Hence, in real observations, we can 
actually constrain the parameter product Qy rby 1bgrH 1,g using 
the cross power spectrum P,(k), if the matter power spectrum 
P,,(k) is known. Furthermore, the constraint on Qy yy 17H Le 
can be achieved, if b, can be properly estimated. Besides, in the 
ideal case, if the auto-correlation of H I intensity mapping can 
be detected at the same time, the correlation coefficient rg r, 
can be constrained by P./ (PPr). Unfortunately, due to 
inevitable foreground residual, experiments which aim at the 
auto-correlation of H I intensity mapping have not achieved any 
convincing result so far. Here we assume the HI auto- 
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Figure 9. The galaxy auto power spectrum of CSST spectroscopic galaxy 
survey at z — 0.5, which is derived from the simulation. 


correlation is unreachable, so that the cosmological parameter 
we can constrain by cross-correlation power spectrum is 
basically limited to Qy bu wer i; and Qu iby iru 1g if be can 
be derived in a galaxy survey. 

Since the matter power spectrum could be accurately 
calculated by a cosmological model, e.g., ACDM, and 
assuming its uncertainty is small enough that it can be 
neglected compared to the HI parameters, the theoretical 
matter power spectrum could be a proper choice for P,,(k). We 
use CAMB (Lewis et al. 2000) to generate a non-linear matter 
power spectrum P,,(k) at z=0.5. Here the values of 
cosmological parameters in CAMB are set to be the same as 
those in the MultiDark simulation (Klypin et al. 2016), and we 
can adopt the values obtained from real cosmological 
observations in the real H I intensity mapping surveys. 

In addition, b, can be estimated from galaxy auto power 
spectrum as indicated in Equation (31), and the galaxy auto 
power spectrum derived from the mock data of CSST galaxy 
survey is displayed in Figure 9. With the help of the theoretical 
matter power spectrum, the corresponding b, can be estimated 
as a function of wavenumber k as shown in Figure 10. As can 
be seen, the value of b, presents an increasing trend as the scale 
gets smaller. This is reasonable since the baryonic matter has 
more complicated physical mechanisms at smaller scales, like 
outflow feedback of galaxy and star formation process, which 
can significantly affect the density fluctuations. On the other 
hand, at linear scales, b, is shown to be a constant. We set the 
scale range to be k< 0.3 h Mpc! (Cunnington et al. 2023; 
Deng et al. 2022), and calculate the average value of b, from 
the power spectrum. The average value and error of b, are 
shown in Table 3. 

In Figure 11, we show the constraint results of 
Qu bu erie and Oy iby 1H 1g as a function of wavenumber 
k. Similar to b,, we derive the average values of these two 
parameter products in the linear scales with k «0.3! Mpc, 
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Figure 10. The galaxy bias factor b, for optical galaxies. The vertical dashed 


line is k = 0.3 ^! Mpc that signifies the boundary of the linear scales we 
consider. 


Table 3 
The Value and Error of b,, Ou jbg el rg and Og ry irure 
Parameter Average Value Error 
b, 0.9307 10.0648 
10^ Qg bu ert te 0.4531 +0.0045 
10° Qu Dir Tiig 0.4812 +0.0048 


and the results are listed in Table 3. We find that the errors are 
~1% of the average values, which mean the precision of our 
future MeerKAT-CSST survey can be one order of magnitude 
more accurate than the present experimental result (Chang et al. 
2010; Masui et al. 2013; Wolz et al. 2022). Furthermore, if 
assuming 7g 1,2 is close to | at linear scales, which is supported 
by the simulation results (e.g., see Deng et al. 2022, and our 
simulation gives ryig~0.94 at k<0.3 Mpc ! h), the 
constraint on Oyjbg,; can be accurately obtained. These 
indicate that the cross-correlation method is powerful in LIM 
surveys for cosmic neutral hydrogen, galaxy evolution and 
cosmological studies. 


6. Summary and Discussion 


HI intensity mapping is a promising technique of cosmo- 
logical detection. Although due to strong foreground contam- 
ination, H I intensity mapping auto-correlation detection is still 
facing great challenges at the present stage, H I-galaxy cross- 
correlation could be easier to achieve by extracting HI and 
cosmological information with the help of galaxy surveys. In 
this work, we have investigated the cross-correlation of 
MeerKAT HI intensity mapping and CSST spectroscopic 
galaxy survey at z © 0.5. 

We first generate mock H I intensity maps of MeerKAT and 
galaxy catalog of CSST spectroscopic galaxy survey using 
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Figure 11. The constraint results of Qy 1b wert 1g (left panel) and Qy by iri 1, (right panel) from the cross-correlation of MeerKAT H I intensity mapping and 
CSST galaxy surveys. The vertical dashed lines are k = 0.3 ^^! Mpc that indicate the boundary of the linear regime we consider. 


SMDPL N-body simulations at z= 0.5. The system noise and 
foreground emission are also taken into account when making 
the sky map. The voxel of a simulation box is divided 
according to the the frequency resolution of L-band and beam 
size of MeerKAT single-dish mode. We constructed the HI 
model to transform the dark matter distribution in the 
simulation snapshot into H I distribution. Then we calculated 
the H I brightness temperature of each voxel to get H I intensity 
maps. The galaxy survey catalog is generated based on the 
NFW profile and HOD model, and the galaxy luminosity is 
derived by galaxy luminosity-halo mass relation. Then we filter 
the galaxies with the magnitude limit of CSST spectroscopic 
survey to produce the mock galaxy data. 

We apply the PCA/SVD algorithm to remove the fore- 
grounds in MeerKAT HI intensity mapping, and cross- 
correlate the residual intensity map with the corresponding 
CSST galaxy map. The signal compensation is employed to 
solve the signal loss caused by the foreground removal process, 
and we construct the transfer function to compensate the cross 
power spectrum. After compensation, we derive the H I-galaxy 
cross power spectrum for constraining cosmological 
parameters. 

We constrain the parameter products Qy bg bra, and 
Qu ibg 1H 14 using the cross power spectrum, and find that the 
constraint accuracy can achieve ~1%, which is one order of 
magnitude higher than the current result. Besides, note that our 
simulation is in a ~300 deg? survey area, and in the future, the 
MeerKAT-CSST detection of H I-galaxy correlation can be 
performed on a much larger survey area of 5000 deg’. Then the 
constraint accuracy can be reduced to <0.3% level. This 
indicates that the cross-correlation of MeerKAT H1 intensity 
mapping and CSST galaxy survey is powerful for exploring the 
property of cosmic neutral hydrogen and the evolution of 
galaxies and our Universe. 

We also note that some assumptions in the current work still 
may be too simple which can affect the prediction of the 
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results. For example, the frequency dependency of the beam 
size may contaminate the data, which will be hard to be 
removed by the PCA/SVD foreground removal process. 
Moreover, the non-flat sky effect also should be seriously 
included, especially for the 5000 deg? MeerKAT-CSST joint 
analysis in the future. Other issues, such as the simple HOD 
model and systematics used in the work, probably need to be 
considered more carefully with more powerful simulations and 
precise H I model in future work. 
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